COVID-19 and Rates of Cancer Diagnosis in the US

Key Points
Question: To what extent did disruptions to the diagnosis of cancer in 2020 resolve during the second year of the COVID-19 pandemic in the US?
Findings: This population-based cross-sectional study found that rates of observed cancer diagnoses improved in 2021 but still remained a significant 2.7% lower than expected. Among screening-detected cancers, female breast cancer showed significant rate recovery in 2021, colorectal cancer returned to prepandemic trends, and significant reductions in diagnoses remained for lung and cervical cancers.
Meaning: This study suggests that cancer cases in the US continued to be underdiagnosed during the second year of the COVID-19 pandemic.


A. Joinpoint regression estimates
Joinpoint regression estimates of expected incidence rates for 2020 and 2021 were derived from Joinpoint regression analysis of annual incidence rate trends from 2000 to 2019. 1 Specifically, for every cancer site of interest (stratified by sex, race and ethnicity, age, urbanicity, or stage at diagnosis, as applicable), we first fit a piecewise log-linear regression model to cancer incidence rate trends from 2000 to 2019 using the National Cancer Institute's Joinpoint software, version 5.1.0.
Prospective models were constrained to at most two breaks (referred to as joinpoints), resulting in no more than three distinct linear segments, each of which was required to contain at least four years of data. Individual linear segments were fit using weighted least squares regression with inverse variance weights. The final piecewise regression model was selected as the one with the lowest weighted Bayesian Information Criterion value, and parametric 95% confidence intervals were determined. 2 Once the best Joinpoint regression model was determined, the slope and intercept parameters of the linear model defining the final segment were extracted. Because each segment had to contain at least four years of data, this segment was guaranteed to cover at least the observed incidence rates for 2016-2019, and it gives the best estimate of trends in the years leading up to the start of the COVID-19 pandemic. We used these parameters to reconstruct the weighted linear regression model for the final segment in the R statistical programming language (version 4.3.2; R Foundation) and simulated 10,000 projections of expected incidence rates for 2020 and 2021. We estimated the final expected incidence rate for each year as the mean of the simulated projections and determined a 95% prediction interval (PI) by taking the 2.5th and 97.5th percentiles of the simulated projections.
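The projection step described above can be sketched as follows. This is a minimal illustration in Python, not the R code used for the analysis. It assumes a log-linear final segment, uses hypothetical values for the intercept, slope, and residual standard deviation, and draws projections from a normal distribution on the log scale without propagating uncertainty in the fitted parameters. The function names `percentile` and `simulate_expected_rate` are illustrative only.

```python
import math
import random
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def simulate_expected_rate(intercept, slope, sigma, year, n_sims=10_000, seed=0):
    """Simulate expected rates for one year from a log-linear final segment.

    Assumed model (hypothetical): ln(rate) = intercept + slope * year + noise,
    with noise ~ Normal(0, sigma). Returns (mean, 2.5th pct, 97.5th pct).
    """
    rng = random.Random(seed)
    center = intercept + slope * year
    sims = sorted(math.exp(rng.gauss(center, sigma)) for _ in range(n_sims))
    return statistics.fmean(sims), percentile(sims, 2.5), percentile(sims, 97.5)


# Hypothetical final-segment parameters: a rate of 450 per 100,000 in 2019
# declining by about 0.5% per year, with a log-scale residual SD of 0.01.
SLOPE = -0.005
INTERCEPT = math.log(450.0) - SLOPE * 2019
SIGMA = 0.01

mean_2020, lower_2020, upper_2020 = simulate_expected_rate(INTERCEPT, SLOPE, SIGMA, 2020)
mean_2021, lower_2021, upper_2021 = simulate_expected_rate(INTERCEPT, SLOPE, SIGMA, 2021)
print(f"2020 expected: {mean_2020:.1f} (95% PI {lower_2020:.1f}-{upper_2020:.1f})")
print(f"2021 expected: {mean_2021:.1f} (95% PI {lower_2021:.1f}-{upper_2021:.1f})")
```

Taking the empirical 2.5th and 97.5th percentiles of the simulated values, rather than a closed-form interval, mirrors the percentile-based PI described in the text and keeps the same machinery usable when simulations from several models are later combined.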

B. ARIMA estimates
The autoregressive integrated moving average (ARIMA) estimates of expected incidence rates for 2020 and 2021 were developed according to the method described previously by Burus et al. 3 For this paper, we sought to estimate monthly age-adjusted incidence rates for January 2020-December 2021 to align with the annual age-adjusted estimates from the Joinpoint regression method. To accommodate the additional 12 points in the projection, an additional 12 months of data (January-December 2017) were added to the prepandemic trends. We retained the same postulated pulse impact exogenous regressor used previously for March-May 2020 and extended the postulated step change impact to include the entire 2021 calendar year. Extending the postulated step change impact allows for testing whether monthly incidence rates for June 2020-December 2021 returned to prepandemic trends. Additional exogenous regressors were considered to account for successive waves of COVID-19 infection during 2021 but were not incorporated because of a lack of well-defined periods and a desire to avoid overfitting the models.
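As an illustration of the two exogenous regressors described above, the following Python sketch builds the monthly pulse and step indicators. It assumes the monthly series runs from January 2017 through December 2021, as the text implies; the helper `month_range` and the variable names are hypothetical, and the ARIMA fit itself is not shown.

```python
def month_range(start, end):
    """Return (year, month) tuples from start to end, inclusive."""
    year, month = start
    months = []
    while (year, month) <= end:
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


# Assumed series window: January 2017 through December 2021 (60 months).
months = month_range((2017, 1), (2021, 12))

# Pulse regressor: 1 only during March-May 2020, 0 otherwise.
pulse = [1 if (2020, 3) <= ym <= (2020, 5) else 0 for ym in months]

# Step regressor: 1 from June 2020 through December 2021, 0 before.
step = [1 if ym >= (2020, 6) else 0 for ym in months]

for ym, p, s in zip(months, pulse, step):
    if ym[0] >= 2020:
        print(f"{ym[0]}-{ym[1]:02d} pulse={p} step={s}")
```

Indicator columns of this kind could then be supplied as exogenous regressors to an ARIMA routine (for example, through the `exog` argument of statsmodels' `SARIMAX`). A step coefficient that is not distinguishable from zero would be consistent with a return to prepandemic trends.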

C. Ensemble estimates (linear pooling with equal weights)
We combined the Joinpoint regression and ARIMA estimates to construct a final ensemble estimate of expected incidence rates using the method of linear pooling with equal weights. 4,5 Ensemble point estimates for 2020 and 2021 were calculated as the mean of the combined set of 10,000 simulated Joinpoint projections and 10,000 simulated ARIMA projections. Ensemble 95% PI bounds were calculated as the 2.5th and 97.5th percentiles of the combined Joinpoint and ARIMA simulations.
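Equal-weight linear pooling of two sets of simulations can be sketched as below. The simulated values here are hypothetical normal draws standing in for the actual Joinpoint and ARIMA projections, and the function names are illustrative. Because both sets contain the same number of draws, concatenating them gives each model equal weight.

```python
import math
import random
from statistics import fmean


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def pool_equal_weights(joinpoint_sims, arima_sims):
    """Pool two equally sized simulation sets; return (mean, 2.5th, 97.5th)."""
    if len(joinpoint_sims) != len(arima_sims):
        raise ValueError("equal weighting here assumes equally sized sets")
    pooled = sorted(list(joinpoint_sims) + list(arima_sims))
    return fmean(pooled), percentile(pooled, 2.5), percentile(pooled, 97.5)


# Hypothetical stand-ins for 10,000 projections from each model
# (rates per 100,000); the real simulations come from the fitted models.
rng = random.Random(1)
jp_sims = [rng.gauss(448.0, 3.0) for _ in range(10_000)]
arima_sims = [rng.gauss(452.0, 4.0) for _ in range(10_000)]

point, lower, upper = pool_equal_weights(jp_sims, arima_sims)
print(f"Ensemble expected rate: {point:.1f} (95% PI {lower:.1f}-{upper:.1f})")
```

The pooled mean equals the simple average of the two model means, while the pooled percentiles reflect both the spread within each model and any disagreement between them, which tends to widen the interval when the models differ.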

D. Potentially undiagnosed cases sensitivity analysis
For additional interpretive context, estimates of potentially undiagnosed cancer cases were generated for some of our findings by multiplying the absolute differences between observed and expected incidence rates by the 2000 US Standard Population denominator of 274,633,642 (or 137,316,821 for sex-specific cancer sites). Although this calculation does not account for changes in the underlying age distribution of populations over time, the use of conservative 95% prediction intervals for the estimates makes the differences negligible. To demonstrate this, we performed a sensitivity analysis comparing our original estimates with estimates generated using our age-stratified findings for the age groups of under 65 years and 65 years or older, weighted according to the corresponding proportions of the 2000 US Standard Population (0.874 and 0.126, respectively). The results are presented in the following table. As expected, all adjusted estimates fall within the 95% prediction intervals of the original estimates.
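The arithmetic behind these estimates can be sketched in Python as follows. The denominator and age-group proportions are taken from the text. The incidence rates are hypothetical, and the age-weighted form (each stratum's rate difference applied to its share of the standard population) is an assumption about how the weighting was applied. Rates are per 100,000, so the product is divided by 100,000.

```python
US_2000_STANDARD_POPULATION = 274_633_642  # use 137_316_821 for sex-specific sites
WEIGHT_UNDER_65 = 0.874
WEIGHT_65_PLUS = 0.126


def undiagnosed_cases(expected_rate, observed_rate,
                      population=US_2000_STANDARD_POPULATION):
    """Cases implied by a rate gap (rates per 100,000) in a given population."""
    return (expected_rate - observed_rate) * population / 100_000


def age_weighted_cases(strata, population=US_2000_STANDARD_POPULATION):
    """Sum stratum-level cases; strata holds (expected, observed, weight)."""
    return sum(
        undiagnosed_cases(expected, observed, population * weight)
        for expected, observed, weight in strata
    )


# Hypothetical rates per 100,000 (not values from the study).
original = undiagnosed_cases(expected_rate=450.0, observed_rate=440.0)
adjusted = age_weighted_cases([
    (230.0, 226.0, WEIGHT_UNDER_65),   # under age 65 years
    (1980.0, 1930.0, WEIGHT_65_PLUS),  # age 65 years or older
])
print(f"Original estimate: {original:,.0f} cases")
print(f"Age-weighted estimate: {adjusted:,.0f} cases")
```

In a comparison of this kind, the adjusted estimate would be checked against the prediction interval of the original estimate, obtained by applying `undiagnosed_cases` to the lower and upper PI bounds of the expected rate.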

eFigure 1. Percentage Difference in Observed From Expected All Sites Cancer Incidence Rates, by Population Subgroup and Period, 2020-2021. 6 Error bars indicate the 95% prediction interval (PI). (*) indicates significantly worse disruption than comparison group based on nonoverlapping 95% PIs. NH = Non-Hispanic; API = Asian and Pacific Islander.

eFigure 2. Percentage Difference in Observed From Expected Incidence Rates, by Urbanicity, Site, and Period, 2020-2021. 6 Error bars indicate the 95% prediction interval (PI). (*) indicates significantly worse disruption than comparison group based on nonoverlapping 95% PIs. NOS = Not Otherwise Specified.

eFigure 3. Percentage Difference in Observed From Expected Incidence Rates, by Age, Site, and Period, 2020-2021. 6 Error bars indicate the 95% prediction interval (PI). (*) indicates significantly worse disruption than comparison group based on nonoverlapping 95% PIs. NOS = Not Otherwise Specified.

eFigure 4. Percentage Difference in Observed From Expected Incidence Rates, by Race and Ethnicity, Site, and Period, 2020-2021. 6 Error bars indicate the 95% prediction interval (PI). (*) indicates significantly worse disruption than comparison group based on nonoverlapping 95% PIs. NOS = Not Otherwise Specified; NH = Non-Hispanic.

eFigure 5. Percentage Difference in Observed From Expected Incidence Rates, by Sex, Site, and Period, 2020-2021. 6 Error bars indicate the 95% prediction interval (PI). (*) indicates significantly worse disruption than comparison group based on nonoverlapping 95% PIs.

eFigure 6. Percentage Difference in Observed From Expected Incidence Rates, by Sex, Age, Site, and Period, 2020-2021. 6 Error bars indicate the 95% prediction interval (PI). (*) indicates significantly worse disruption than comparison group based on nonoverlapping 95% PIs.

eFigure 7. Monthly Age-Adjusted All Cancer Sites Incidence Rates, by Sex and Age, January 2017-December 2021. 6 (A) Female, under age 65 years; (B) male, under age 65 years; (C) female, age 65 years or older; (D) male, age 65 years or older. Grey segment highlights March-April 2021.

eTable 1. Potentially Missed Cancer Cases, by Site and Period, 2020-2021, SEER 22
a Rates given per 100,000 people in the population and age-adjusted to the 2000 US Standard Population.
Abbreviations: PI = Prediction Interval; NOS = Not Otherwise Specified.

eTable 2. Potentially Missed Screening-Detected Cancer Cases by Stage at Diagnosis and Period, 2020-2021, SEER 22 Database, 2001-2021. 6
a Rates given per 100,000 people in the population and age-adjusted to the 2000 US Standard Population.
b Early stage at diagnosis defined as localized stage only.
c Late stage at diagnosis defined as regional and distant stage.
Abbreviations: PI = Prediction Interval.